
    Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators

    In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principal investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC, acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology.

    The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, and high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.

    Management, Analyses, and Distribution of the MaizeCODE Data on the Cloud

    MaizeCODE is a project aimed at identifying and analyzing functional elements in the maize genome. In its initial phase, MaizeCODE assayed up to five tissues from four maize strains (B73, NC350, W22, TIL11) by RNA-Seq, ChIP-Seq, RAMPAGE, and small RNA sequencing. To facilitate reproducible science and provide both human and machine access to the MaizeCODE data, we enhanced SciApps, a cloud-based portal, for analysis and distribution of both raw data and analysis results. Based on the SciApps workflow platform, we generated new components to support the complete cycle of MaizeCODE data management. These include publicly accessible scientific workflows for the reproducible and shareable analysis of various functional data, a RESTful API for batch processing and distribution of data and metadata, a searchable data page that lists each MaizeCODE experiment as a reproducible workflow, and integrated JBrowse genome browser tracks linked with workflows and metadata. The SciApps portal is a flexible platform that allows the integration of new analysis tools, workflows, and genomic data from multiple projects. Through metadata and a ready-to-compute cloud-based platform, the portal experience improves access to the MaizeCODE data and facilitates its analysis.

    Double triage to identify poorly annotated genes in maize: The missing link in community curation

    The sophistication of gene prediction algorithms and the abundance of RNA-based evidence for the maize genome may suggest that manual curation of gene models is no longer necessary. However, quality metrics generated by the MAKER-P gene annotation pipeline identified 17,225 of 130,330 (13%) protein-coding transcripts in the B73 Reference Genome V4 gene set with models of low concordance to available biological evidence. Working with eight graduate students, we used the Apollo annotation editor to curate 86 transcript models flagged by quality metrics and a complementary method using the Gramene gene tree visualizer. All of the triaged models had significant errors, including missing or extra exons, non-canonical splice sites, and incorrect UTRs. A correct transcript model existed for about 60% of genes (or transcripts) flagged by quality metrics; we attribute this to the convention of elevating the transcript with the longest coding sequence (CDS) to the canonical, or first, position. The remaining 40% of flagged genes resulted in novel annotations and represent a manual curation space of about 10% of the maize genome (~4,000 protein-coding genes). MAKER-P metrics have a specificity of 100% and a sensitivity of 85%; the gene tree visualizer has a specificity of 100%. Together with the Apollo graphical editor, our double triage provides an infrastructure to support the community curation of eukaryotic genomes by scientists, students, and potentially even citizen scientists. © 2019 This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

    The iPlant Collaborative: Cyberinfrastructure for Plant Biology

    The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure, enabling plant scientists to perform complex analyses on large datasets without the need to master the command line or high-performance computational services.

    Engineering American society: the lesson of eugenics

    We stand at the threshold of a new century, with the whole human genome stretched out before us. Messages from science, the popular media, and the stock market suggest a world of seemingly limitless opportunities to improve human health and productivity. But at the turn of the last century, science and society faced a similar rush to exploit human genetics. The story of eugenics, humankind's first venture into a 'gene age', holds a cautionary lesson for our current preoccupation with genes.

    DNA science : a first course in recombinant DNA technology

    This book is both an introductory text and a lab manual for a course in recombinant DNA technology. The topics of the eight chapters of text include background, basic and advanced tools and techniques, gene regulation in development, and applications. Topics of some of the ten highly detailed lab exercises are bacterial culture techniques, DNA restriction analysis, recombination of antibiotic-resistant genes, and replica plating in identification of bacterial populations. Appendices include recipes for media and solutions. The text closes with a bibliography and a combination glossary/index.

    Essays on science and society. Lessons from a science education portal

    Doing your best on the Web requires attending to search engines, answering hard questions, and making cybertools accessible to a broad audience.

    Talented students and motivated teachers: an interactive and synergistic tandem to design innovative hands-on learning practices in biosciences : CusMiBio and the City DNA Barcode Project in Italy

    CusMiBio, the Centre of the University and School of Milan for Bioscience education, is a project launched in 2004 by the University of Milan and the Lombardy educational office to improve science education in high schools. Together with activities directed to all students from all types of high schools, CusMiBio has developed actions dedicated to the most gifted students. The City Barcode Project (CBP) is an “authentic” research project that engages talented students and their teachers in original research to study biological diversity in their urban environment. In the CusMiBio project, modelled on infrastructure and programmes developed at the DNA Learning Centre (DNALC) of Cold Spring Harbor Laboratory, New York, teams of students and teachers were invited to analyse the diversity of other living things and products in their parks, rivers, homes, restaurants, and stores using DNA barcoding, a new approach now applied to the discovery, cataloging and monitoring of biodiversity and to the objective identification of animal and plant species. Each group engaged in the CBP underwent all the experimental steps: i) field activities (sample collection), ii) experimental activities (DNA extraction, PCR and sequencing, in CusMiBio labs), iii) bioinformatics analyses (in CusMiBio bioinformatics labs, with the support of CusMiBio staff for access to the barcode database through a simplified bioinformatic platform) and iv) preparation of a short report and a poster, to be presented at the annual Researchers' Night event in Milan. Science education is traditionally accomplished in the context of “canned” labs with known outcomes. The CBP is an open-ended research project to engage the most motivated and talented students in all aspects of scientific inquiry and in the “process” of science.